Programming Massively Parallel Processors: A Hands-on Approach: Beyond Linear Arrays: Scaling to Multidimensional Data

Welcome to The Great Transfer. In CPU programming, we define how to iterate; in GPGPU programming, we define what a single iteration looks like. This shift from instruction-centric logic to data-centric logic is driven by the Kernel Abstraction.
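To make the contrast concrete, here is a minimal sketch using element-wise vector addition, an operation chosen purely for illustration; the function names `add_cpu` and `add_gpu` are hypothetical. The CPU version owns the loop, while the kernel body describes only one iteration of it.

```cuda
// CPU style: the programmer writes the loop and decides how to iterate.
void add_cpu(const float* a, const float* b, float* c, int n) {
    for (int i = 0; i < n; ++i) {
        c[i] = a[i] + b[i];
    }
}

// GPU style: the kernel body describes ONE iteration of that loop.
// There is no loop here; the launch creates one thread per element,
// and each thread works out which i it owns.
__global__ void add_gpu(const float* a, const float* b, float* c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {  // the grid may contain more threads than elements
        c[i] = a[i] + b[i];
    }
}
```

Both functions compute the same result for every `i`; the difference is only who is responsible for enumerating the indices.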

1. The __global__ Blueprint

By using the __global__ qualifier, you are not just writing a function: you are designing a scalable blueprint. A __global__ function (a kernel) is called from the host and executes on the device, once per thread. Each thread's execution of the kernel body is an independent unit of work, which lets the GPU schedule thousands of identical tasks across its many cores without any manual thread management.
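A minimal sketch of how the qualifiers divide the work, assuming a simple scaling operation chosen only for illustration (`scale_one` and `scale_kernel` are hypothetical names). `__global__` marks the kernel that the host launches; `__host__ __device__` marks an ordinary helper compiled for both the CPU and the GPU, which also makes it easy to test on the host.

```cuda
// __host__ __device__: compiled for both sides, so it can be called from a
// kernel and also tested directly on the CPU.
__host__ __device__ inline float scale_one(float x, float factor) {
    return x * factor;
}

// __global__: a kernel. It returns void, is launched from host code with
// <<<blocks, threads>>>, and runs on the device. Every thread executes this
// same body on its own element: one independent unit of work.
__global__ void scale_kernel(float* data, float factor, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        data[i] = scale_one(data[i], factor);
    }
}
```

From host code the kernel would be launched as `scale_kernel<<<blocks, threads>>>(d_data, 2.0f, n);`, where `d_data` is assumed to be a pointer to device memory.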

2. The Global Address Resolver

How does a single thread among millions find its target? It follows a deterministic contract known as the indexing formula, which gives each thread its global index $i$:

$$i = \text{blockIdx.x} \times \text{blockDim.x} + \text{threadIdx.x}$$

This formula acts as a coordinate system: it maps each element of the software's logical data (the array) to exactly one position in the thread hierarchy that the hardware schedules (blocks and threads).
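The formula can be written once as a `__host__ __device__` helper so that the same arithmetic runs inside a kernel and can be checked on the CPU. This is a sketch; `global_index`, `write_own_index`, and `covers_exactly_once` are hypothetical names. For instance, thread 5 of block 3 in a grid of 128-thread blocks gets i = 3 × 128 + 5 = 389. The host-side check enumerates every (block, thread) pair to confirm the mapping hits each index exactly once.

```cuda
#include <cstddef>
#include <vector>

// Global index of a thread, usable from both host and device code.
__host__ __device__ inline int global_index(int block_idx, int block_dim, int thread_idx) {
    return block_idx * block_dim + thread_idx;
}

// Inside a kernel, the arguments come from CUDA's built-in variables.
__global__ void write_own_index(int* out, int n) {
    int i = global_index(blockIdx.x, blockDim.x, threadIdx.x);
    if (i < n) {
        out[i] = i;
    }
}

// Host-side check: a grid of num_blocks blocks with block_dim threads each
// must produce every index in [0, num_blocks * block_dim) exactly once.
bool covers_exactly_once(int num_blocks, int block_dim) {
    std::vector<int> hits(static_cast<std::size_t>(num_blocks) * block_dim, 0);
    for (int b = 0; b < num_blocks; ++b) {
        for (int t = 0; t < block_dim; ++t) {
            ++hits[global_index(b, block_dim, t)];
        }
    }
    for (int h : hits) {
        if (h != 1) return false;
    }
    return true;
}
```

The one-to-one property is what lets every thread write to its own element without coordinating with the others.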

3. Execution Configuration

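The <<<B, T>>> parameters define the shape of the grid: B blocks of T threads each. This is what enables Transparent Scalability: because blocks are independent and may execute in any order, the runtime can distribute them across however many SMs are available, so the same code runs unchanged whether the hardware has 2 SMs or 80. Because B is usually rounded up to a whole number of blocks, the last block can contain threads whose index falls past the end of the array, which is why kernels guard their work with a check such as if (i < n).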
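A minimal host-side sketch of choosing B and T, assuming a one-dimensional array already resident in device memory. `blocks_for`, `add_one`, and `launch_add_one` are hypothetical names, and 256 threads per block is a common choice rather than a requirement. The grid size uses ceiling division so that B × T ≥ n.

```cuda
#include <cassert>
#include <cuda_runtime.h>

// Number of blocks so that blocks * threads_per_block >= n (ceiling division).
__host__ __device__ inline int blocks_for(int n, int threads_per_block) {
    return (n + threads_per_block - 1) / threads_per_block;
}

__global__ void add_one(float* data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {            // the last block may hold threads past the end
        data[i] += 1.0f;
    }
}

// d_data is assumed to point to device memory holding n floats.
cudaError_t launch_add_one(float* d_data, int n) {
    assert(n >= 0);
    if (n == 0) return cudaSuccess;        // a launch with 0 blocks is invalid
    const int T = 256;                     // threads per block
    const int B = blocks_for(n, T);        // blocks in the grid
    add_one<<<B, T>>>(d_data, n);          // same launch on any number of SMs
    cudaError_t err = cudaGetLastError();  // reports invalid launch configurations
    if (err != cudaSuccess) return err;
    return cudaDeviceSynchronize();        // waits and surfaces execution errors
}
```

With n = 1000 and T = 256, this gives B = 4, so 1024 threads are launched and the 24 with i ≥ 1000 skip the write.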


QUESTION 1

What is the primary role of the __global__ qualifier?

To define a function that runs on the CPU and is called by the GPU.

To mark a function as a kernel that is callable from the host and executes on the device.

To synchronize all threads across the entire GPU grid.

To allocate memory in the global memory space.

QUESTION 2

If blockIdx.x = 2, blockDim.x = 256, and threadIdx.x = 10, what is the global index?

266

512

522

778

QUESTION 3

What does 'Transparent Scalability' imply in CUDA?

The memory automatically scales with the size of the input array.

The same code can run on different GPUs with varying SM counts without modification.

Threads can see into the registers of other threads.

The kernel speed increases linearly with the clock speed of the CPU.

QUESTION 4

Why is the if (i < n) check necessary in a kernel?

To prevent the GPU from overheating.

To ensure threads do not access memory outside the valid array bounds.

To check if the kernel is running on the correct SM.

To synchronize memory access between threads.

QUESTION 5

Which variable represents the number of threads within a single block?

gridDim.x

blockIdx.x

blockDim.x

threadIdx.x
